Online Learning for Latent Dirichlet Allocation

نویسندگان

  • Matthew D. Hoffman
  • David M. Blei
  • Francis R. Bach
چکیده

We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream. We study the performance of online LDA in several ways, including by fitting a 100-topic topic model to 3.3M articles from Wikipedia in a single pass. We demonstrate that online LDA finds topic models as good or better than those found with batch VB, and in a fraction of the time.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Distributed Online Learning for Latent Dirichlet Allocation

A major obstacle in using Latent Dirichlet Allocation (LDA) is the amount of time it takes for inference, especially for a dataset that starts out large and expands quickly, such as a corpus of blog posts or online news articles. Recent developments in distributed inference algorithms for LDA, as well as minibatchbased online learning algorithms have offered partial solutions for problem. In th...

متن کامل

Particle Filter Rejuvenation and Latent Dirichlet Allocation

Previous research has established several methods of online learning for latent Dirichlet allocation (LDA). However, streaming learning for LDA— allowing only one pass over the data and constant storage complexity—is not as well explored. We use reservoir sampling to reduce the storage complexity of a previously-studied online algorithm, namely the particle filter, to constant. We then show tha...

متن کامل

Online but Accurate Inference for Latent Variable Models with Local Gibbs Sampling

We study parameter inference in large-scale latent variable models. We first propose an unified treatment of online inference for latent variable models from a non-canonical exponential family, and draw explicit links between several previously proposed frequentist or Bayesian methods. We then propose a novel inference method for the frequentist estimation of parameters, that adapts MCMC method...

متن کامل

Bipartite Graph for Topic Extraction

This article presents a bipartite graph propagation method to be applied to different tasks in the machine learning unsupervised domain, such as topic extraction and clustering. We introduce the objectives and hypothesis that motivate the use of graph based method, and we give the intuition of the proposed Bipartite Graph Propagation Algorithm. The contribution of this study is the development ...

متن کامل

Parameter Estimation for the Latent Dirichlet Allocation

We review three algorithms for parameter estimation of the Latent Dirichlet Allocation model: batch variational Bayesian inference, online variational Bayesian inference and inference using collapsed Gibbs sampling. We experimentally compare their time complexity and performance. We find that the online variational Bayesian inference converges faster than the other two inference techniques, wit...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010